Turnbull China Bikeride




Text File | 1995-06-28 | 8KB | 304 lines

Patterns Previous: <Format=>Format> * Next: <Matching=>Matching> * Up: <Top=>!Root> #Wrap on {fH3}Patterns{f} The patterns in the input are written using an extended set of regular expressions. These are: #Indent +4 #Indent {fEmphasis}x{f} #Indent +4 match the character {fEmphasis}x{f} #Indent {fEmphasis}.{f} #Indent +4 any character (byte) except newline #Indent {fEmphasis}[xyz]{f} #Indent +4 a "character class"; in this case, the pattern matches either an {fEmphasis}x{f}, a {fEmphasis}y{f}, or a {fEmphasis}z{f} #Indent {fEmphasis}[abj-oZ]{f} #Indent +4 a "character class" with a range in it; matches an {fEmphasis}a{f}, a {fEmphasis}b{f}, any letter from {fEmphasis}j{f} through {fEmphasis}o{f}, or a {fEmphasis}Z{f} #Indent {fEmphasis}[^A-Z]{f} #Indent +4 a "negated character class", i.e., any character but those in the class. In this case, any character EXCEPT an uppercase letter. #Indent {fEmphasis}[^A-Z\\n]{f} #Indent +4 any character EXCEPT an uppercase letter or a newline #Indent {fEmphasis}{fStrong}r{f}\*{f} #Indent +4 zero or more {fStrong}r{f}'s, where {fStrong}r{f} is any regular expression #Indent {fEmphasis}{fStrong}r{f}+{f} #Indent +4 one or more {fStrong}r{f}'s #Indent {fEmphasis}{fStrong}r{f}?{f} #Indent +4 zero or one {fStrong}r{f}'s (that is, "an optional {fStrong}r{f}") #Indent {fEmphasis}{fStrong}r{f}\{2,5\}{f} #Indent +4 anywhere from two to five {fStrong}r{f}'s #Indent {fEmphasis}{fStrong}r{f}\{2,\}{f} #Indent +4 two or more {fStrong}r{f}'s #Indent {fEmphasis}{fStrong}r{f}\{4\}{f} #Indent +4 exactly 4 {fStrong}r{f}'s #Indent {fEmphasis}\{{fStrong}name{f}\}{f} #Indent +4 the expansion of the "{fStrong}name{f}" definition (see above) #Indent {fEmphasis}"[xyz]\\"foo"{f} #Indent +4 the literal string: {fEmphasis}[xyz]"foo{f} #Indent {fEmphasis}\\{fStrong}x{f}{f} #Indent +4 if {fStrong}x{f} is an {fEmphasis}a{f}, {fEmphasis}b{f}, {fEmphasis}f{f}, {fEmphasis}n{f}, {fEmphasis}r{f}, {fEmphasis}t{f}, or {fEmphasis}v{f}, then the ANSI-C interpretation of 
\\{fStrong}x{f}. Otherwise, a literal {fEmphasis}{fStrong}x{f}{f} (used to escape operators such as {fEmphasis}\*{f}) #Indent {fEmphasis}\\0{f} #Indent +4 a NUL character (ASCII code 0) #Indent {fEmphasis}\\123{f} #Indent +4 the character with octal value 123 #Indent {fEmphasis}\\x2a{f} #Indent +4 the character with hexadecimal value {fCode}2a{f} #Indent {fEmphasis}({fStrong}r{f}){f} #Indent +4 match an {fStrong}r{f}; parentheses are used to override precedence (see below) #Indent {fEmphasis}{fStrong}r{f}{fStrong}s{f}{f} #Indent +4 the regular expression {fStrong}r{f} followed by the regular expression {fStrong}s{f}; called "concatenation" #Indent {fEmphasis}{fStrong}r{f}|{fStrong}s{f}{f} #Indent +4 either an {fStrong}r{f} or an {fStrong}s{f} #Indent {fEmphasis}{fStrong}r{f}\/{fStrong}s{f}{f} #Indent +4 an {fStrong}r{f} but only if it is followed by an {fStrong}s{f}. The text matched by {fStrong}s{f} is included when determining whether this rule is the {fUnderline}longest match{f}, but is then returned to the input before the action is executed. So the action only sees the text matched by {fStrong}r{f}. This type of pattern is called {fUnderline}trailing context{f}. (There are some combinations of {fEmphasis}{fStrong}r{f}\/{fStrong}s{f}{f} that {fCode}flex{f} cannot match correctly; see notes in the Deficiencies \/ Bugs section below regarding "dangerous trailing context".) #Indent {fEmphasis}^{fStrong}r{f}{f} #Indent +4 an {fStrong}r{f}, but only at the beginning of a line (i.e., when just starting to scan, or right after a newline has been scanned). #Indent {fEmphasis}{fStrong}r{f}${f} #Indent +4 an {fStrong}r{f}, but only at the end of a line (i.e., just before a newline). Equivalent to "{fStrong}r{f}\/\\n". Note that flex's notion of "newline" is exactly whatever the C compiler used to compile flex interprets '\\n' as; in particular, on some DOS systems you must either filter out \\r's in the input yourself, or explicitly use {fStrong}r{f}\/\\r\\n for "r$". 
#Indent {fEmphasis}<{fStrong}s{f}>{fStrong}r{f}{f} #Indent +4 an {fStrong}r{f}, but only in start condition {fStrong}s{f} (see below for discussion of start conditions) #Indent {fEmphasis}<{fStrong}s1{f},{fStrong}s2{f},{fStrong}s3{f}>{fStrong}r{f}{f} #Indent +4 same, but in any of start conditions {fStrong}s1{f}, {fStrong}s2{f}, or {fStrong}s3{f} #Indent {fEmphasis}<\*>{fStrong}r{f}{f} #Indent +4 an {fStrong}r{f} in any start condition, even an exclusive one. #Indent {fEmphasis}<<EOF>>{f} #Indent +4 an end-of-file #Indent {fEmphasis}<{fStrong}s1{f},{fStrong}s2{f}><<EOF>>{f} #Indent +4 an end-of-file when in start condition {fStrong}s1{f} or {fStrong}s2{f} #Indent Note that inside of a character class, all regular expression operators lose their special meaning except escape ('\\') and the character class operators, '-', ']', and, at the beginning of the class, '^'. The regular expressions listed above are grouped according to precedence, from highest precedence at the top to lowest at the bottom. Those grouped together have equal precedence. For example, #Wrap off #fCode foo|bar\* #f #Wrap on is the same as #Wrap off #fCode (foo)|(ba(r\*)) #f #Wrap on since the '\*' operator has higher precedence than concatenation, and concatenation higher than alternation ('|'). This pattern therefore matches {fEmphasis}either{f} the string "foo" {fEmphasis}or{f} the string "ba" followed by zero-or-more r's. To match "foo" or zero-or-more "bar"'s, use: #Wrap off #fCode foo|(bar)\* #f #Wrap on and to match zero-or-more "foo"'s-or-"bar"'s: #Wrap off #fCode (foo|bar)\* #f #Wrap on In addition to characters and ranges of characters, character classes can also contain character class {fUnderline}expressions{f}. These are expressions enclosed inside {fEmphasis}[:{f} and {fEmphasis}:]{f} delimiters (which themselves must appear between the '[' and ']' of the character class; other elements may occur inside the character class, too). 
The valid expressions are: #Wrap off #fCode [:alnum:] [:alpha:] [:blank:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:] #f #Wrap on These expressions all designate a set of characters equivalent to the corresponding standard C {fEmphasis}isXXX{f} function. For example, {fEmphasis}[:alnum:]{f} designates those characters for which {fEmphasis}isalnum(){f} returns true - i.e., any alphabetic or numeric character. Some systems don't provide {fEmphasis}isblank(){f}, so flex defines {fEmphasis}[:blank:]{f} as a blank or a tab. For example, the following character classes are all equivalent: #Wrap off #fCode [[:alnum:]] [[:alpha:][:digit:]] [[:alpha:]0-9] [a-zA-Z0-9] #f #Wrap on If your scanner is case-insensitive (the {fEmphasis}-i{f} flag), then {fEmphasis}[:upper:]{f} and {fEmphasis}[:lower:]{f} are equivalent to {fEmphasis}[:alpha:]{f}. Some notes on patterns: #Indent +4 - A negated character class such as the example "[^A-Z]" above {fEmphasis}will match a newline{f} unless "\\n" (or an equivalent escape sequence) is one of the characters explicitly present in the negated character class (e.g., "[^A-Z\\n]"). This is unlike how many other regular expression tools treat negated character classes, but unfortunately the inconsistency is historically entrenched. Matching newlines means that a pattern like [^"]\* can match the entire input unless there's another quote in the input. - A rule can have at most one instance of trailing context (the '\/' operator or the '$' operator). The start condition, '^', and "<<EOF>>" patterns can only occur at the beginning of a pattern, and, as well as with '\/' and '$', cannot be grouped inside parentheses. A '^' which does not occur at the beginning of a rule or a '$' which does not occur at the end of a rule loses its special properties and is treated as a normal character. 
The following are illegal: #Wrap off #fCode foo\/bar$ <sc1>foo<sc2>bar #f #Wrap on Note that the first of these can be written "foo\/bar\\n". The following will result in '$' or '^' being treated as a normal character: #Wrap off #fCode foo|(bar$) foo|^bar #f #Wrap on If what's wanted is a "foo" or a bar-followed-by-a-newline, the following could be used (the special '|' action is explained below): #Wrap off #fCode foo | bar$ \/\* action goes here \*\/ #f #Wrap on A similar trick will work for matching a foo or a bar-at-the-beginning-of-a-line. #Indent
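
The precedence rules, the interval operators, the character class expressions and the newline note above can be tried out without building a scanner. The following is only a sketch and does not invoke flex: it assumes a POSIX system and uses the extended regular expressions of regcomp() from regex.h, which share the operator precedence and bracket syntax described here but none of the flex-specific operators (trailing context, start conditions, <<EOF>>). The names full_match() and pattern_demo() are hypothetical helpers introduced for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if `pattern` matches ALL of `text`,
 * 0 if it does not, and -1 if the pattern fails to compile.
 * The pattern is wrapped as ^(pattern)$ so the whole string must match. */
int full_match(const char *pattern, const char *text)
{
    char wrapped[256];
    regex_t re;
    int n, rc;

    n = snprintf(wrapped, sizeof wrapped, "^(%s)$", pattern);
    assert(n > 0 && (size_t)n < sizeof wrapped);

    /* REG_NEWLINE is deliberately not set: a negated class that does not
     * list '\n' can then match a newline, as the note above describes. */
    if (regcomp(&re, wrapped, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Hypothetical demo driver: print the result of a few checks. */
void pattern_demo(void)
{
    /* '*' binds tighter than concatenation, which binds tighter than '|' */
    printf("foo|bar*   on \"barrr\":     %d\n", full_match("foo|bar*", "barrr"));
    printf("foo|bar*   on \"barbar\":    %d\n", full_match("foo|bar*", "barbar"));
    printf("foo|(bar)* on \"barbar\":    %d\n", full_match("foo|(bar)*", "barbar"));
    printf("(foo|bar)* on \"foobarfoo\": %d\n", full_match("(foo|bar)*", "foobarfoo"));

    /* interval operator: two to five r's */
    printf("r{2,5}     on \"rrr\":       %d\n", full_match("r{2,5}", "rrr"));

    /* equivalent ways of writing "alphanumeric" (in the C locale) */
    printf("[[:alpha:][:digit:]]+ on \"abc123\": %d\n",
           full_match("[[:alpha:][:digit:]]+", "abc123"));

    /* a negated class matches a newline unless '\n' is listed in it */
    printf("[^A-Z]     on newline:     %d\n", full_match("[^A-Z]", "\n"));
    printf("[^A-Z\\n]   on newline:     %d\n", full_match("[^A-Z\n]", "\n"));
}
```

Calling pattern_demo() from a main() function prints one line per check. Wrapping each pattern in ^( )$ is a design choice for this sketch only: it forces a whole-string match so that the effect of precedence is visible, whereas a flex scanner instead picks the longest match at the current input position.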